Index-Based Approximate XML Joins

Authors

  • Sudipto Guha
  • Nick Koudas
  • Divesh Srivastava
  • Ting Yu
Abstract

XML data integration tools face a variety of challenges to operating efficiently and effectively. Among these is the need to handle the inconsistencies and mistakes present in the data sets. In this paper we study the problem of integrating XML data sources through index-assisted join operations, using notions of approximate match in the structure and content of XML documents as the join predicate. We show how a well-known and widely deployed index structure, the R-tree, can be adapted to improve the performance of such operations. We propose novel search and join algorithms for R-trees adapted to index XML document collections. We also propose novel optimization objectives for R-tree construction that make R-trees better suited to this application.

Similar resources

Embedding Similarity Joins into Native XML Databases

Similarity joins in databases can be used for several important tasks such as data cleaning and instance-based data integration. In this paper, we explore how to support such tasks in a native XML database environment. The main goals of our work are: a) to prove the feasibility of performing tree similarity joins in a general-purpose XML database management system; b) to support string and ...

Index XML Data Using Extended Order and Path Index

The eXtensible Markup Language (XML) is becoming a new standard for information representation and exchange over the Internet. How to index XML data for efficient query processing and XML transformation is an important subject in the XML community. In this paper, building on the extended preorder indexing method, we add path information as part of the index. It is shown that the number of path joins c...

Holistic Twig Joins on Indexed XML Documents

Finding all the occurrences of a twig pattern specified by a selection predicate on multiple elements in an XML document is a core operation for efficient evaluation of XML queries. Holistic twig join algorithms were proposed recently as an optimal solution when the twig pattern only involves ancestor-descendant relationships. In this paper, we address the problem of efficient processing of holi...

Approximate Geospatial Joins with Precision Guarantees

Geospatial joins are a core building block of connected mobility applications. An especially challenging problem is the join between streaming points and static polygons. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an approximate geospatial join that guarantees a user-...

Indexing Schemes for Efficient Aggregate Computation over Structural Joins

With the increasing popularity of XML as a standard for data representation and exchange, efficient XML query processing has become a necessity. One popular approach encodes the hierarchical structure of XML data through a node numbering scheme, thus reducing typical queries to special forms (structural, path, twig) of containment joins. In this paper we consider how using an index can facilita...

Publication date: 2003